Corpus linguistics for low-density varieties. Minority languages and corpus-based morphological investigations
Authors
Abstract
Corpus linguistics grew up in the domain of written (and literary) varieties, while its recent methodological revolution is due to the computer-assisted capacity of elaborating massive amounts of text data. On the other hand, the so-called ‘low-density varieties’, including spoken varieties as well as the varieties of minority communities, have been confined to a rather marginal role. Among other reasons, this is due to technical problems connected with the scarce degree of normalization in linguistic (including graphemic) terms, and to the scarcity of language resources for automatic processing. In the paper, we will exploit the possibilities opened by corpus linguistics for acquiring and analyzing the textual patrimony of the Walser German communities in Piedmont and Aosta Valley. The Highest Alemannic varieties spoken there, dramatically exposed to decay, provide a limited but significant amount of data, which is accompanied by substantial lexical documentation due to the active collaboration of the speakers’ communities in collecting and compiling local dictionaries. After briefly introducing our archive and discussing the peculiar solutions adopted for the construction of the platform, we will also present corpus-based morphological investigations regarding the representation of verbal prefixes, the clitic group, and the inflectional behaviour of verb classes.
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Useful statistics for corpus linguistics
• frequencies of occurrence of linguistic elements, which can be studied from two different perspectives:
  o how frequent are morphemes or words or patterns/constructions in (parts of) a corpus? This information can be provided in various different forms of frequency lists;
  o how evenly are morphemes or words or patterns/constructions distributed across (parts of) a corpus? This information can ...
Corpus Linguistics: Quantitative Methods
p. 1, par. 2: Until relatively recently, however, words and syntactic patterns, or constructions, were treated on a par not only theoretically, but also empirically. → It is only since recently, however, that words and syntactic patterns, or constructions, are treated on a par not only theoretically, but also empirically. p. 1, first bullet: how much does give prefer to occur in the ditransitiv...
Journal
Journal title: Corpus
Year: 2022
ISSN: 1765-3126, 1638-9808
DOI: https://doi.org/10.4000/corpus.7345